Abstract

One of the most fundamental problems in 
large-scale network analysis is to determine the 
importance of a particular node in a network. 
Betweenness centrality is the most widely used 
metric to measure the importance of a node in a 
network. In this paper, we present a randomized 
parallel algorithm and an algebraic method for 
computing betweenness centrality of all nodes in 
a network. We prove that any path-comparison 
based algorithm cannot compute betweenness 
in less than $O(nm)$ time.

Keywords: all-pairs shortest paths, between- 
ness centrality, lower bounds, parallel graph al- 
gorithms, social networks. 

1 Introduction 

One of the most fundamental problems in large- 
scale network analysis is to determine the im- 
portance of a particular node (or an edge) in 
a network. For example, in social networks we 
wish to know agents that have very short con- 
nections to large portions of the population. In 
communication networks we wish to know the 
links that carry a lot of traffic, ISPs that at- 
tract a lot of business, links that, if disconnected, 
decrease network performance dramatically, and 
so on. A particular way to measure the impor- 
tance of network elements (nodes or edges) is us- 
ing centrality metrics such as closeness centrality 
[29], graph centrality [19], stress centrality [31]
and betweenness centrality ([15], [2]). An important
application of centrality arises in the study of
epidemic phenomena in networks, when an infectious
disease or a computer virus is disseminated.
The power of a node to spread the epidemic is 
related to its centrality [28]. Centrality metrics 
also find applications in natural language
processing [14], to compute the relative importance
of textual units.

Betweenness centrality (introduced by Freeman [15]
and Anthonisse [2]) is the most popular (and
computationally expensive) centrality metric. Some
recent applications of betweenness include the study
of biological networks [20], [26], [US], the study of
sexual networks and AIDS [24], identifying key actors
in terrorist networks [22], [10], organizational
behavior [6], supply chain management and
transportation networks [18]. Betweenness can also
be used as a heuristic
to solve NP-hard problems like graph clustering. 
For example, Newman and Girvan [25] developed 
a heuristic to find community structure in large 
networks, based on betweenness of the edges of 
the network. 

Since the networks of interest are huge, it is 
important to develop algorithms that compute 
these metrics efficiently. Brandes [4] showed that
betweenness centrality can be computed in the 
same asymptotic time bounds as n Single Source 
Shortest Path (SSSP) computations. Brandes 
and Pich [5] presented experimental results of 
estimating different centrality measures under 
various node-selection strategies. Eppstein and 
Wang [13] presented a randomized approxima- 
tion algorithm for closeness centrality. 






1.1 Betweenness Centrality 

We denote a network by an undirected graph
$G(V,E)$, with vertex set $\{v_1, v_2, \ldots, v_n\}$
(or $\{1, 2, \ldots, n\}$), with $|V| = n$ vertices and
$|E| = m$ edges, representing the relationships
between the vertices. In this paper, we refer to
connected undirected graphs, unless otherwise stated.
Each edge $e \in E$ has a positive integer weight
$w(e)$. Unweighted graphs have $w(e) = 1$ for all
edges. A path from $s$ to $t$ is defined as a sequence
of edges $(v_i, v_{i+1})$, $0 \le i < l$, where
$v_0 = s$ and $v_l = t$. The length of a path is the
sum of the weights of the edges in this sequence. We
use $d(s,t)$ to denote the distance (the minimum
length of any path connecting $s$ and $t$ in $G$)
between vertices $s$ and $t$. We set $d(i,i) = 0$ by
convention. We denote the total number of shortest
paths between vertices $s$ and $t$ by
$\lambda_{st} = \lambda_{ts}$. We set
$\lambda_{ss} = 1$ by convention. The number of
shortest paths between $s$ and $t$ passing through a
vertex $v$ is denoted by $\lambda_{st}(v)$. Let
$Diam(G)$ be the diameter (the length of the longest
shortest path) of the graph $G$. Let $A = (a_{ij})$ be
the adjacency matrix of the graph, i.e., $A$ is a 0-1
matrix with $a_{ij} = 1$ iff $(i,j) \in E$.

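Let $\delta_{st}(v)$ denote the fraction of shortest
paths between $s$ and $t$ that pass through a
particular vertex $v$, i.e.,
$\delta_{st}(v) = \lambda_{st}(v) / \lambda_{st}$. We
call $\delta_{st}(v)$ the pair-dependency of $s,t$ on
$v$. Betweenness centrality of a vertex $v$ is defined
as

$$BC(v) = \sum_{s \neq v \neq t \in V} \delta_{st}(v)$$

The dependency of a source vertex $s \in V$ on a
vertex $v \in V$ is defined as

$$\delta_{s*}(v) = \sum_{t \in V} \delta_{st}(v)$$

The betweenness centrality of a vertex $v$ can
then be expressed as

$$BC(v) = \sum_{s \neq v \in V} \delta_{s*}(v)$$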

Define the set of predecessors of a vertex $v$
on shortest paths from $s$ as
$P_s(v) = \{u \in V : (u,v) \in E,\ d(s,v) = d(s,u) + w(u,v)\}$.
The following theorem states that the dependencies of
the closer vertices can be computed from the
dependencies of the farther vertices.

Theorem 1.1. [4] The dependency of $s \in V$ on
any $v \in V$ obeys

$$\delta_{s*}(v) = \sum_{w \,:\, v \in P_s(w)} \frac{\lambda_{sv}}{\lambda_{sw}} \left(1 + \delta_{s*}(w)\right)$$

Brandes's algorithm [4] is based on the above
theorem. First, $n$ single-source shortest paths
(SSSP) computations are done, one for each
$s \in V$. The predecessor sets $P_s(v)$ are
maintained during these computations. Next, for
every $s \in V$, using the information from the
shortest paths tree and predecessor sets along
the paths, compute the dependencies $\delta_{s*}(v)$
for all other $v \in V$. To compute the betweenness
value of a vertex $v$, we finally compute the sum of
all dependency values. The $O(n^2)$ space
requirements can be reduced to $O(n + m)$ by
maintaining a running centrality score. Note
that the centrality scores need to be divided by
two if the graph is undirected, since all shortest
paths are considered twice. Brandes's algorithm
runs in $O(nm)$ time for unweighted graphs and
$O(nm + n^2 \log n)$ time for weighted graphs.
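
The two phases described above (one BFS per source, then accumulation of dependencies from the farthest vertices inward) can be sketched as follows for unweighted graphs. This is a minimal illustration and not the paper's own code: the function name `brandes_betweenness` and the adjacency-dictionary input format are assumptions made for the example.

```python
from collections import deque


def brandes_betweenness(adj):
    """Betweenness of every vertex of an unweighted, undirected graph.

    adj maps each vertex to an iterable of its neighbours (assumed format).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward pass: BFS from s records distances, path counts, predecessors.
        dist = {s: 0}
        num_paths = {v: 0 for v in adj}
        num_paths[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    num_paths[w] += num_paths[u]
                    preds[w].append(u)
        # Backward pass: Theorem 1.1, applied from the farthest vertices inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += num_paths[u] / num_paths[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return {v: score / 2.0 for v, score in bc.items()}
```

Only the running score `bc` survives across sources, which is the $O(n + m)$ space variant mentioned in the text.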

1.2 Our Results 

Brandes's algorithm is a path-comparison based
algorithm. We prove that any path-comparison
based algorithm cannot compute betweenness in
less than $O(nm)$ time. Betweenness centrality
is closely related to the All Pairs Shortest Paths
problem (APSP), and algebraic methods have
been very successful in obtaining better running
times for APSP ([30], [1], [32], [15], [H], [34]).
We present an algebraic method for computing
betweenness centrality of all nodes in a
network. For unweighted graphs, our algorithm
runs in time $O(n^{\omega} Diam(G))$, where
$\omega < 2.376$ is the exponent of matrix
multiplication and $Diam(G)$ is the diameter of the
graph. For weighted graphs with integer weights taken
from the range $\{1, 2, \ldots, M\}$, we present an
algorithm that runs in time $O(M n^{\omega} Diam(G))$.
As in [1], our time bounds hold in the model where all
arithmetic operations (independent of the size of the
numbers) take unit time and numbers use unit
space. Recent observations on real-world graph
evolution, such as densification and shrinking
diameters [23], make our algorithms very relevant
to real-world graphs.

We present a randomized parallel algorithm
for computing betweenness centrality of all
nodes in a network. Our approach is based
on the randomized parallel SSSP algorithm
for unweighted graphs given by Ullman and
Yannakakis [33]. We compute the betweenness
in two stages (which we call the forward pass
and the backward pass). Our algorithm for the
forward pass runs in $O(n)$ time using
$O(m \log n)$ processors for unweighted graphs
and O(n log^ n log M) time using $O(m)$ processors
for weighted graphs with integer weights
taken from the range $\{1, 2, \ldots, M\}$. Our
backward pass algorithm runs in $O(n^2)$ time using
$O(n)$ processors for both weighted and
unweighted graphs. For bounded-degree graphs,
we present an optimal backward pass algorithm
that runs in $O(n \log m)$ time using $O(m)$
processors for unweighted graphs and $O(M n \log m)$
time using $O(m)$ processors for weighted graphs.

2 Lower Bounds 

Definition 2.1. A Path-Comparison Based
Algorithm [11]: A path-comparison based algorithm
$\mathcal{A}$ accepts as input a graph $G$ and a
weight function. The algorithm $\mathcal{A}$ can
perform all standard operations. However, the only
way it can access the edge weights is to compare the
weights of two different paths.

Karger, Koller and Phillips [11] established
that $\Omega(n^3)$ is a lower bound on the complexity
of any path-comparison based algorithm for
the all-pairs shortest path problem on a graph
with $\Theta(n^2)$ edges. They conjectured that
similar lower bounds hold for undirected graphs also.
We use their construction to derive lower bounds
on computing betweenness in directed graphs.
For the details of the construction we refer the
reader to [11].

The graph $G$ they constructed is a directed
tripartite graph on vertices $u_i$, $v_j$ and $w_k$,
where $i$, $j$ and $k$ range from $0$ to $n - 1$. The
edge set of $G$ is $\{(u_i, v_j)\} \cup \{(v_j, w_k)\}$.
Therefore, the only paths are individual edges and
paths $(u_i, v_j, w_k)$ of length two. A weight
function $W$ is chosen so that the unique shortest
path from $u_i$ to $w_k$ goes through $v_0$. Note that
the betweenness of the node $v_0$ is $n^2$. Let
$\mathcal{A}$ be any path-comparison based algorithm.
Consider giving $(G, W)$ as input to $\mathcal{A}$,
and suppose that $\mathcal{A}$ runs correctly. It must
therefore output $n^2$ as the betweenness of $v_0$,
based on the set of optimal paths $L$. Suppose further
that a particular path
$p^* = (u_{i^*}, v_{j^*}, w_{k^*})$ was never one of
the operands in any comparison operation that
$\mathcal{A}$ performed. The weight function can be
suitably modified (as in [11]) to $W'$, in which $p^*$
is the unique shortest path from $u_{i^*}$ to
$w_{k^*}$, but the ordering by weight of all the other
paths remains the same. Note that the centrality of
$v_0$ decreases with the new weight function $W'$. If
we run $\mathcal{A}$ on $(G, W')$, all path
comparisons not involving $p^*$ give the same result
as they did using $W$. Therefore, since $\mathcal{A}$
never performed a comparison involving $p^*$ while
running on $W$, we deduce that it still outputs $n^2$,
which is now incorrect. The following theorem
is immediate.

Theorem 2.2. There exists a directed graph of
$3n$ vertices on which any path-comparison based
algorithm for betweenness must perform at least
$n^3/2$ path weight comparisons.

A similar argument can be used to show an
$\Omega(nm)$ lower bound on graphs of $m$ edges.
Assume without loss of generality that $m > 4n$ and
that $2n$ divides $m$. We perform the same
construction, but of the middle vertices we use only
$v_1, \ldots, v_{m/2n}$, connecting each of them to
all the vertices $u_i$ and $w_k$. This requires $m$
edges and creates $mn/2$ paths.

Theorem 2.3. There exists a directed graph
with $2n + m/2n$ vertices and $m$ edges, on
which any path-comparison based algorithm for
betweenness must perform at least $mn/2$ path
weight comparisons.
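
The edge and path counts used in this argument can be checked on a concrete instance. The sketch below builds the sparse tripartite construction and counts its two-edge paths; the vertex labels `("u", i)`, `("v", j)`, `("w", k)` and both function names are hypothetical choices made for the illustration, and no weight function is modelled.

```python
def sparse_tripartite(n, m):
    """Directed tripartite graph: n sources u_i, m/(2n) middle vertices v_j, n sinks w_k."""
    assert m % (2 * n) == 0, "the construction assumes that 2n divides m"
    middle = m // (2 * n)
    edges = [(("u", i), ("v", j)) for i in range(n) for j in range(middle)]
    edges += [(("v", j), ("w", k)) for j in range(middle) for k in range(n)]
    return edges


def count_two_edge_paths(edges):
    """Number of directed paths (u_i, v_j, w_k) of length two."""
    out_degree = {}
    for tail, _ in edges:
        out_degree[tail] = out_degree.get(tail, 0) + 1
    # Each edge (a, b) starts exactly out_degree[b] two-edge paths.
    return sum(out_degree.get(head, 0) for _, head in edges)
```

For $n = 4$ and $m = 32$ this yields 32 edges and $mn/2 = 64$ two-edge paths, one per comparison the argument forces.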



Conjecture : Computing betweenness of a 
single vertex is at least as hard as computing 
betweenness of all vertices. 



We make the following conjecture for comput- 
ing betweenness centrality in general graphs. If 
our conjecture is true, then the existing tech- 
niques for APSP provide lower bounds for com- 
puting betweenness. 



Conjecture : Computing betweenness of 
all vertices is at least as hard as computing 
all-pairs shortest distances. 



3 An Algebraic Method 

We denote matrices by upper case letters and
the elements of a matrix by the corresponding
lower case letter. Recall that $A$ is the adjacency
matrix of the graph. Let $0_{n \times n}$ be an
$n \times n$ zero matrix. Let $I_{n \times n}$ be an
$n \times n$ identity matrix. Let $D$ be an
$n \times n$ matrix of distances, i.e.,
$d_{ij} = d(i,j)$. Let $D_l$ be a 0-1 matrix such that
$(d_l)_{ij} = 1$ iff $d(i,j) = l$. Let $\Lambda$ be an
$n \times n$ matrix, where $\lambda_{ij}$ is the
number of shortest paths between $i$ and $j$. Let
$\Delta$ be an $n \times n$ matrix of dependencies,
i.e., $\delta_{ij} = \delta_{i*}(j)$. Let $\Delta_l$
be a matrix such that $(\delta_l)_{ij}$ is non-zero
and equal to $\delta_{i*}(j)$ iff $d(i,j) = l$. If $X$
and $Y$ are two matrices, we let $X$ mult $Y$
($X$ div $Y$) be the matrix obtained by element-wise
multiplication (division) of the matrices $X$ and
$Y$. We let $X \cdot Y$ denote the product of the two
matrices $X$ and $Y$, i.e.,
$(X \cdot Y)_{ij} = \sum_{k} x_{ik} y_{kj}$. We call
the computation of the distance and the number of
shortest paths (between all pairs) the forward
pass, since shortest paths are computed using
BFS/Dijkstra's algorithm. The computation of
dependencies is called the backward pass, since
dependencies are computed in a bottom-up fashion.
In other words, the matrices $D$ and $\Lambda$ are
computed in the forward pass and the matrix $\Delta$
is computed in the backward pass.

3.1 Unweighted Graphs 
3.1.1 Forward Pass 

The lengths of all shortest paths can be computed
using the following theorem of Seidel [30].

Theorem 3.1. [30] All-pairs shortest distances
for undirected unweighted graphs can be computed
in time $O(n^{\omega} \log(Diam(G)))$.

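We compute the number of shortest paths
($\lambda_{ij}$ for all $i, j$) using the following
algorithm:

ComputePathCount($A$)

    Initialize $Z$ to $I_{n \times n}$
    Initialize $\Lambda$ to $I_{n \times n}$
    Initialize $\Lambda_{prev}$ and $\Lambda_{curr}$ to $0_{n \times n}$

    for $l \leftarrow 1$ to $Diam(G)$
        $Z \leftarrow Z \cdot A$
        for $i, j \leftarrow 1$ to $n$
            if $(\Lambda_{prev})_{ij} \neq 0$
                $(\Lambda_{curr})_{ij} \leftarrow 0$
            else
                $(\Lambda_{curr})_{ij} \leftarrow z_{ij}$
        $\Lambda \leftarrow \Lambda + \Lambda_{curr}$
        $\Lambda_{prev} \leftarrow \Lambda_{prev} + \Lambda_{curr}$

    for $i \leftarrow 1$ to $n$
        $\lambda_{ii} \leftarrow 1$

    return $\Lambda$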

Correctness: Note that $Z = A^l$ after the $l$-th
iteration of the main for loop. Let
$A^l = (a^l_{ij})$. It is easy to see that $a^l_{ij}$
equals the number of paths (not necessarily shortest)
from $i$ to $j$ of length exactly $l$. Hence
$a^l_{ij}$, the first time it is non-zero, represents
the number of shortest paths from $i$ to $j$, of
length exactly $l$. The first time we encounter a
non-zero value of $a^l_{ij}$, we store the value in
$\Lambda_{curr}$ and eventually in $\Lambda$. Also, we
make sure that these values are not overwritten
in future iterations. In the end we set all
$\lambda_{ii}$ to 1 by convention. Hence the above
algorithm correctly computes the number of shortest
paths, for all pairs, in an undirected unweighted
graph. As a consequence we get the following lemma:

Lemma 3.2. All-pairs shortest path counts for
undirected unweighted graphs can be computed in
time $O(n^{\omega} Diam(G))$.
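
A direct transcription of the path-counting idea into Python is sketched below. It is an illustration under stated assumptions, not the authors' implementation: it uses a naive cubic matrix product instead of fast matrix multiplication, takes the diameter as an argument, and, as a simplification, tests the accumulated count matrix itself (initialized to the identity) rather than keeping separate previous and current matrices. The names `mat_mul` and `compute_path_count` are made up for the example.

```python
def mat_mul(x, y):
    """Naive product of two square integer matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def compute_path_count(a, diam):
    """Shortest-path counts for a connected unweighted graph with adjacency matrix a."""
    n = len(a)
    z = [[int(i == j) for j in range(n)] for i in range(n)]   # Z = A^0
    counts = [row[:] for row in z]                            # lambda_ii = 1
    for _ in range(diam):
        z = mat_mul(z, a)                                     # Z = A^l
        for i in range(n):
            for j in range(n):
                # A pair still at zero has no walk shorter than l, so the
                # first non-zero power entry is its shortest-path count.
                if counts[i][j] == 0 and z[i][j] != 0:
                    counts[i][j] = z[i][j]
    return counts
```

On the 4-cycle, opposite corners get a count of 2 after the second iteration, while adjacent pairs keep the count 1 fixed in the first iteration.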

3.1.2 Backward Pass 

Lemma 3.3. If $d(i,j) = Diam(G)$, then
$\delta_{i*}(j) = \delta_{ij} = 0$. Hence
$\Delta_{Diam(G)} = 0_{n \times n}$.

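Lemma 3.4. For unweighted graphs, if $l = Diam(G)$
then $\Delta_{l-1} = (D_l \text{ div } \Lambda) \cdot A$.

Proof. We have the following cases:

Case I: $d(i,j) = l - 1$

$$
\begin{aligned}
\sum_{k=1}^{n} \frac{(d_l)_{ik}}{\lambda_{ik}} \, a_{kj}
  &= \sum_{k \,:\, a_{kj}=1,\ (d_l)_{ik}=1} \frac{1}{\lambda_{ik}} \\
  &= \sum_{k \,:\, a_{kj}=1,\ d(i,k)=l} \frac{1}{\lambda_{ik}} \\
  &= \sum_{k \,:\, j \in P_i(k)} \frac{1}{\lambda_{ik}} \\
  &= \sum_{k \,:\, j \in P_i(k)} \frac{1 + \delta_{i*}(k)}{\lambda_{ik}} \\
  &= \frac{\delta_{i*}(j)}{\lambda_{ij}}
\end{aligned}
$$

Note that we have used the fact that, if
$d(i,k) = l = Diam(G)$ then $\delta_{i*}(k) = 0$. The
last equality is Theorem 1.1 divided by
$\lambda_{ij}$; the element-wise multiplication by
$\Lambda$ in ComputeDependency then yields
$\delta_{i*}(j)$.

Case II: $d(i,j) < l - 1$

$$
\sum_{k=1}^{n} \frac{(d_l)_{ik}}{\lambda_{ik}} \, a_{kj}
  = \sum_{k \,:\, a_{kj}=1,\ d(i,k)=l} \frac{1}{\lambda_{ik}}
  = 0
$$

since if $d(i,j) < l - 1$, $\nexists\, k$ such that
$d(i,k) = l$ and $a_{kj} = 1$.

Case III: $d(i,j) = l$

In this case $(\delta_{l-1})_{ij} = 0$ by the
definition of $\Delta_{l-1}$, and the Mask step of
ComputeDependency sets the corresponding entry to
zero. □

Lemma 3.5. For unweighted graphs, if $l < Diam(G)$
then $\Delta_{l-1} = ((D_l + \Delta_l) \text{ div } \Lambda) \cdot A$.

Proof. This can be proved by induction using the
previous lemma as the base case, and the argument
is similar to the proof for unweighted trees.
In addition we use the fact that shortest path
trees have no cross edges (i.e., all the edges of
a BFS tree join vertices of levels that differ by at
most one). Hence, the dependencies computed at
distance $l - 1$ use only the dependencies at
distance $l$. □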



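ComputeDependency($A$, $D$, $\Lambda$)

    Initialize $\Delta$ to $0_{n \times n}$
    Initialize $\Delta_{Diam(G)}$ to $0_{n \times n}$
    for $l \leftarrow Diam(G)$ to 1
        Construct a 0-1 matrix $D_l$, such that
            $(d_l)_{ij} = 1$ iff $d(i,j) = l$.
        $\Delta_{l-1} \leftarrow ((D_l + \Delta_l) \text{ div } \Lambda) \cdot A$
        $\Delta_{l-1} \leftarrow$ Mask($\Delta_{l-1}$, $l-1$)
        $\Delta_{l-1} \leftarrow \Delta_{l-1}$ mult $\Lambda$
        $\Delta \leftarrow \Delta + \Delta_{l-1}$
    return $\Delta$

Mask($X$, $l$)

    for all $1 \le i, j \le n$
        if $d(i,j) \neq l$
            $x_{ij} \leftarrow 0$
    return $X$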



From the previous lemma, it is easy to see
that the above algorithm runs in
$O(n^{\omega} Diam(G))$ time using $O(n^2)$ space.
Once the dependencies are computed, the centrality of
each node can be computed by adding the corresponding
dependencies, in $O(n^2)$ time.

Theorem 3.6. The betweenness of all vertices of
an undirected unweighted graph $G$ can be computed
in time $O(n^{\omega} Diam(G))$.
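
The backward pass can be illustrated in plain Python as below. This is a sketch under assumptions rather than the paper's implementation: the matrices are lists of lists, the matrix product is the naive cubic one, the Mask and "mult" steps are fused into the inner loop, and the loop stops at $l = 2$ because the level $l - 1 = 0$ only holds the dependency of a vertex on itself, which does not contribute to betweenness. The function name `compute_dependency` is hypothetical.

```python
def compute_dependency(a, d, lam):
    """Dependency matrix from adjacency a, distances d and path counts lam."""
    n = len(a)
    diam = max(max(row) for row in d)
    delta = [[0.0] * n for _ in range(n)]   # accumulated dependencies
    level = [[0.0] * n for _ in range(n)]   # Delta_l, zero for l = Diam(G)
    for l in range(diam, 1, -1):
        # (D_l + Delta_l) div Lambda, element-wise.
        q = [[((1.0 if d[i][k] == l else 0.0) + level[i][k]) / lam[i][k]
              for k in range(n)] for i in range(n)]
        nxt = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if d[i][j] == l - 1:        # Mask: keep pairs at distance l-1
                    total = sum(q[i][k] * a[k][j] for k in range(n))
                    nxt[i][j] = total * lam[i][j]   # mult Lambda
        for i in range(n):
            for j in range(n):
                delta[i][j] += nxt[i][j]
        level = nxt
    return delta
```

Summing column $v$ of the result (and halving for an undirected graph) gives the betweenness of $v$.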

3.2 Weighted Graphs 
3.2.1 Forward Pass 

We make use of a well-known reduction from 
APSP to the computation of the distance prod- 
uct (also known as the min-plus product) of two 
n X n matrices. 

Definition 3.7. Distance Products: Let $X$,
$Y$ be $n \times n$ matrices. The distance product of
$X$ and $Y$, denoted $X \star Y$, is an $n \times n$
matrix $Z$ such that

$$z_{ij} = \min_{k=1}^{n} \{x_{ik} + y_{kj}\}, \quad \text{for } 1 \le i, j \le n.$$

It is well-known that the distance product of
two $n \times n$ matrices, whose elements are taken
from the set $\{-M, \ldots, 0, \ldots, M\} \cup \{+\infty\}$,
can be computed in time $O(M n^{\omega})$. Combining
the distance products with our observations for
unweighted graphs we get the following theorem.
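
For concreteness, the definition of the distance product can be evaluated directly. The sketch below is the naive cubic-time evaluation, not the $O(M n^{\omega})$ method referred to above; the name `distance_product` and the use of `float("inf")` for missing edges are assumptions made for the example.

```python
INF = float("inf")


def distance_product(x, y):
    """Min-plus product: z[i][j] = min over k of x[i][k] + y[k][j]."""
    n = len(x)
    return [[min(x[i][k] + y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

If $W$ is a weight matrix with zeros on the diagonal and `INF` for non-edges, `distance_product(W, W)` holds the lengths of the shortest paths that use at most two edges.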

Theorem 3.8. All-pairs shortest distances
and number of shortest paths for undirected
weighted graphs with integer weights taken
from $\{1, 2, \ldots, M\}$ can be computed in time
$O(M n^{\omega} Diam(G))$.

The lengths of all shortest paths can also
be computed by the following theorem of Alon,
Galil and Margalit [1].

Theorem 3.9. [1] All-pairs shortest distances
for undirected weighted graphs with integer
weights taken from $\{1, 2, \ldots, M\}$ can be
computed in time $O(M n^{\omega})$.

3.2.2 Backward Pass

Let $D$, $D_l$, $\Lambda$, $\Delta_l$, $\Delta$ be the
matrices as defined earlier. Let $A^*$ be a 0-1 matrix
with $a^*_{ij} = 1$ iff $w(i,j) = d(i,j)$. In other
words, $a^*_{ij} = 1$ iff the edge $(i,j)$
participates in the shortest paths.

Theorem 3.10. ComputeDependency($A^*$, $D$, $\Lambda$)
correctly computes the dependencies in a
weighted graph with integer weights taken from
$\{1, 2, \ldots, M\}$ in time $O(M n^{\omega} Diam(G))$.

Proof. Follows from the correctness of the
algorithm for unweighted graphs. □

4 A Randomized Parallel Algorithm

We assume a model of parallel computation
called the OR CRCW PRAM [3], in which multiple
processors can simultaneously read and write to
a shared memory. If multiple processors attempt
to write multiple values to a single location, the
value written is the bitwise OR of the values.
The most elementary parallel SSSP algorithm is
parallel breadth-first search, in which the nodes
are visited level by level as the search progresses.
Level 0 consists of the source. The problem with
this approach is that the time required grows
linearly with the number of levels traversed. To
keep the time small, Ullman and Yannakakis [33]
use $k$-limited search.

The size of a path is the number of nodes in the
path and the minimum path-size is the shortest
distance measured in number of nodes traversed.
A $k$-limited shortest path from $s$ to $t$ is a path
from $s$ to $t$ that is no longer than any $s$-to-$t$
path of size at most $k$. To find $k$-limited shortest
paths in unweighted graphs we can run $k$ iterations
of parallel BFS. We call this $k$-limited
breadth-first search. The work required by a parallel
algorithm is defined to be the product of time and
number of processors required; this corresponds
to the time that would be required if the parallel
processors were all simulated by a single processor.
If the work of a parallel algorithm is equal
to the time required by a sequential algorithm for
the same problem, then the parallel algorithm is
said to be optimal.
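
A sequential simulation of $k$-limited breadth-first search is sketched below. It is only meant to make the level-by-level structure concrete: the function name `k_limited_bfs` is hypothetical, the graph is assumed to be an adjacency dictionary, and $k$ is taken here as the number of BFS iterations (edges traversed), whereas the text measures path size in nodes.

```python
def k_limited_bfs(adj, source, k):
    """Distances from source to every vertex reached within k BFS iterations."""
    dist = {source: 0}
    frontier = [source]
    for level in range(1, k + 1):
        next_frontier = []
        # On a PRAM the whole frontier would be expanded in one parallel step.
        for u in frontier:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = level
                    next_frontier.append(w)
        if not next_frontier:
            break
        frontier = next_frontier
    return dist
```

The number of outer iterations, and hence the parallel time, is bounded by $k$ regardless of the diameter of the graph.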

In the following sections, we present parallel
algorithms for the forward and backward passes.
The forward pass consists of computing the distance
and the number of shortest paths (between
all pairs). The backward pass involves computing
the dependencies. Once the dependencies are
known, to compute the betweenness value of a
vertex $v$, we can simply compute the sum of all
the dependencies for each vertex. This can be
done in $O(n \log n)$ time using $O(n)$ processors.

4.1 Forward Pass 

4.1.1 Unweighted Graphs 

Ullman and Yannakakis's algorithm [33] for parallel
BFS uses $k$-limited search with random
sampling of distinguished vertices, based on
the following well-known observation (see, e.g.,
Greene and Knuth [17]). Their algorithm uses
about $\sqrt{n} \log n$ distinguished nodes, and
therefore needs to search forward for about
$\sqrt{n}$ distance from each distinguished node. Our
algorithm for parallelizing the forward pass is based
on their technique.

Theorem 4.1. [17] Given a path of length k in
a graph, a random sample of vertices will 

have at least one vertex belonging to the path with 
probability 1 — 

Theorem 4.2. With high probability, Algorithm 1
correctly computes the shortest paths
from the source $s$ to all the other nodes in $V$.
The parallel global time is $O(\sqrt{n})$ using
$m \log n$ processors.



Proof. Given any $v \in V$, let $P_v$ be an arbitrary
shortest path from $s$ to $v$. From Theorem 4.1,
with high probability, each subpath of $P_v$ of size
$\sqrt{n}$ contains at least one node $x \in S$.
Hence, $P_v$ can be seen as a sequence of subpaths of
size not larger than $\sqrt{n}$, whose extremal nodes
belong to $S$ (except for the last node $v$). Such
subpaths are computed in the $\sqrt{n}$-limited search
in Step 2. Thus, the shortest path from $s$ to the
last $S$-vertex $x$ in $P_v$ is correctly computed in
Step 4 and the shortest path from the latter to node
$v$ is correctly computed in Step 2. The
$\sqrt{n}$-limited search, in Step 2, can be performed
in $O(\sqrt{n})$ time using $m \log n$ processors. The
total work of Step 4 is O((√n)^ log n) and can be done
in $O(\sqrt{n})$ time using $m \log n$ processors.
Correctness of the number of shortest paths follows. □

Since we need the distances and number of
shortest paths between all pairs of vertices,
we can simply run the above algorithm $n$
times, once for each source vertex. This approach
duplicates many computations. Since we
choose $O(\sqrt{n} \log n)$ distinguished nodes, we
can compute the shortest path distances from each
of these distinguished nodes (treating them as
source nodes) with a single run of Algorithm 1.
The following theorem states that we need to
run the algorithm only $O(\sqrt{n})$ times. This
results in an optimal parallel algorithm (modulo
log-factors) for the forward pass.

Theorem 4.3. With high probability, Algorithm 1
is run only $O(\sqrt{n})$ times to compute
all-pairs shortest distances and number of
shortest paths.

Proof. Let us say we run Algorithm 1
independently $k$ times. Each time the algorithm
picks $\sqrt{n} \log n$ vertices. Then the probability
that a vertex $v \in V$ is not picked in any of
these iterations is given by

$$\Pr[v \text{ not picked}] = \left(1 - \frac{1}{n}\right)^{k \sqrt{n} \log n} \le e^{-\frac{k \sqrt{n} \log n}{n}}$$

Choosing $k = c\sqrt{n}$, for some constant $c > 0$, we
get

$$\Pr[v \text{ not picked}] \le e^{-\frac{c \sqrt{n} \sqrt{n} \log n}{n}} = e^{-c \log n} = e^{-c' \ln n} = \frac{1}{n^{c'}}$$

Hence the probability that a vertex $v \in V$ is
not picked in any of the $O(\sqrt{n})$ iterations is
very small, inverse polynomial in $n$. □
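
The bound in this proof can be evaluated numerically. The sketch below assumes the logarithm is the natural one (so that $c' = c$) and treats every pick as an independent uniform draw, as the proof does; the function name `miss_probability` is made up for the illustration.

```python
import math


def miss_probability(n, c):
    """Exact value and exponential bound for 'a fixed vertex is never sampled'."""
    k = c * math.sqrt(n)                      # number of runs of Algorithm 1
    draws = k * math.sqrt(n) * math.log(n)    # vertices picked over all runs
    exact = (1.0 - 1.0 / n) ** draws
    bound = math.exp(-draws / n)              # uses 1 - x <= e^{-x}
    return exact, bound
```

For example, `miss_probability(1024, 2)` returns a bound equal to $1024^{-2}$ up to rounding, with the exact value slightly below it.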

Theorem 4.4. With high probability, we
can compute the $D$ and $\Lambda$ matrices for an
unweighted graph in $O(n)$ time using $O(m \log n)$
processors.



Algorithm 1:

Input: An undirected graph $G(V,E)$, a source
$s \in V$.

Output: $d(s,v)$ and $\lambda_{sv}$ for all $v \in V$.

1. Choose uniformly at random a subset $S$ of
$V$, together with $s$; the size of $S$ must be
$\Theta(\sqrt{n} \log n)$.

2. From any $x \in S$ perform, in parallel, a
$\sqrt{n}$-limited search, generating the shortest
path $P'_{x,v}$ from $x$ to every node $v \in V$.

3. An auxiliary weighted graph $H$ is computed
on the vertex set $S$, where the weight of an
edge is defined to be the length computed
by the previous $\sqrt{n}$-limited search.

4. Compute the all-pairs shortest paths $P_{x,y}$ in
$H$, with no-limited search.

5. The shortest distance $d(s,v) = |P_v|$, from
$s$ to a node $v \in V$, is computed in the
following way:

$$P_v = P_{s,min} \cup P'_{min,v}$$

where $min$ is a vertex in $H$ for which

$$|P_{s,min}| + |P'_{min,v}| = \min_{x \in H} \{ |P_{s,x}| + |P'_{x,v}| \}$$

6. The number of shortest paths $\lambda_{sv}$ can be
computed by counting the number of such
$min$ nodes.
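
Steps 1 to 5 can be simulated sequentially as in the following sketch, which computes distances only and leaves out the path counts of Step 6. Several details are assumptions made for the illustration: the names `limited_bfs` and `sampled_sssp`, the optional `sample` and `seed` parameters, the sample size $\lceil \sqrt{n} \rceil \log_2 n$ (capped at $n$), a search limit of $\lceil \sqrt{n} \rceil$ edges, and the use of Floyd-Warshall for Step 4.

```python
import math
import random
from collections import deque


def limited_bfs(adj, source, k):
    """BFS from source that stops expanding after k edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def sampled_sssp(adj, s, sample=None, seed=0):
    """Distances from s via sampled limited searches (correct with high probability)."""
    n = len(adj)
    k = math.isqrt(n - 1) + 1                       # ceil(sqrt(n))
    if sample is None:                              # Step 1: random distinguished set
        size = min(n, math.ceil(k * math.log2(n)))
        sample = random.Random(seed).sample(sorted(adj), size)
    chosen = set(sample) | {s}
    # Step 2: limited search from every distinguished vertex.
    near = {x: limited_bfs(adj, x, k) for x in chosen}
    # Steps 3-4: auxiliary graph H on the sample, then all-pairs distances in H.
    h = {x: {y: near[x].get(y, math.inf) for y in chosen} for x in chosen}
    for mid in chosen:
        for x in chosen:
            for y in chosen:
                if h[x][mid] + h[mid][y] < h[x][y]:
                    h[x][y] = h[x][mid] + h[mid][y]
    # Step 5: combine a path in H with one final limited search.
    return {v: min(h[s][x] + near[x].get(v, math.inf) for x in chosen)
            for v in adj}
```

Passing an explicit `sample` makes the behaviour deterministic, which is convenient for checking the combination step on small graphs.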



4.1.2 Weighted Graphs 

Ullman and Yannakakis's approach cannot be directly
applied to weighted graphs; indeed, there is
no apparent way to perform the $\sqrt{n}$-limited
search efficiently, especially when the weights are
large. On the other hand, it is easy to verify
that the remaining steps of Algorithm 1
also work for weighted graphs, thus the crucial
problem is to find a weighted version of the
$\sqrt{n}$-limited search. A useful method for solving
optimization problems which involve numerical
inputs is to uniformly shrink all weights;
but this, in itself, is not sufficient since the
search is strongly based on the fact that weights
are integers. Klein and Subramanian [21] proposed
a $\sqrt{n}$-limited search for weighted graphs
which uses the integer shrinking together with
the well-established technique, due to Raghavan
and Thompson [27], for rounding weights
without changing their sums "too much". The
key idea is that a non-integral value is rounded
up or down according to a probability function
which reflects how close the value is to the next
higher integer and the next lower one. By applying
this approach to the basic techniques of Ullman
and Yannakakis, Klein and Subramanian provided
a randomized parallel algorithm for SSSP
in weighted graphs. Their algorithm runs in
O(√n log^ n log M) time using $O(m)$ processors
to compute an SSSP tree. We enhance their
algorithm to compute all-pairs shortest paths
(and the number of shortest paths). The
modifications needed are similar to those presented
in the previous section. We mention our main theorem
here.

Theorem 4.5. With high probability, we can
compute the $D$ and $\Lambda$ matrices for a weighted
graph, with integer weights taken from the range
$\{1, 2, \ldots, M\}$, in O(n log^ n log M) time using
$O(m)$ processors.

4.2 Backward Pass 
4.2.1 General Graphs 

After the forward pass is performed, we may assume
that the matrices $D$ and $\Lambda$ are available
in the shared memory. The following algorithm
computes the betweenness centralities (without
actually computing the dependencies) in $O(n^2)$
time using $O(n)$ processors.



Algorithm 2:

Input: $D$ and $\Lambda$ matrices.

Output: Betweenness centrality ($BC(v)$) of all
vertices.

Let $n$ processors represent the vertices.
Each processor maintains a running centrality
score $BC(v)$, initialized to zero.
For each pair of vertices $s, t \in V$, processor $v$
($v \neq s \neq t$) does the following:
    if $d(s,t) = d(s,v) + d(v,t)$
        $BC(v) \mathrel{+}= \frac{\lambda_{sv} \lambda_{vt}}{\lambda_{st}}$
    else
        $BC(v) \mathrel{+}= 0$
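
Algorithm 2 can be simulated on a single processor as sketched below, with the outer loop standing in for the $n$ processors. The function name `betweenness_from_matrices` is hypothetical, and the sketch assumes that each unordered pair $\{s, t\}$ is visited once, so no final halving is needed.

```python
from itertools import combinations


def betweenness_from_matrices(d, lam):
    """Betweenness of all vertices from the distance and path-count matrices."""
    n = len(d)
    bc = [0.0] * n
    for v in range(n):                          # one iteration per "processor"
        for s, t in combinations(range(n), 2):
            if v in (s, t):
                continue
            # v lies on a shortest s-t path iff the distances add up.
            if d[s][t] == d[s][v] + d[v][t]:
                bc[v] += lam[s][v] * lam[v][t] / lam[s][t]
    return bc
```

Each simulated processor touches every pair once, which matches the quadratic running time per processor stated for the general backward pass.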



4.2.2 Bounded Degree Graphs 

In this section we present a faster parallel
algorithm for the backward pass in bounded-degree
graphs. The backward pass involves computing the
dependencies (i.e., computing the matrix $\Delta$).
Recall the following lemma.

Lemma 3.3: If $d(i,j) = Diam(G)$, then
$\delta_{i*}(j) = \delta_{ij} = 0$.

Brandes's theorem (Theorem 1.1) states that
the dependencies of the closer vertices can be
computed from the dependencies of the farther
vertices. The following algorithm
(ComputeDependency) uses this fact (and a small trick)
to compute the dependencies in parallel. The
main idea behind the algorithm is to compute the
dependencies of pairs of vertices (taking a maximum
of $n/2$ pairs) which are at distance $d$.
Distance $d$ is decreased from $n$ to 1.



ComputeDependency($A$, $D$, $\Lambda$)

    For $d \leftarrow n$ to 1
        Let $V_d = \{ v \in V : \exists\, u \in V \text{ with } d(u,v) = d \}$
        While $|V_d| \neq 0$
            Select a maximum of $n/2$ pairs of vertices
            (with no two pairs having a common vertex)
            from $V_d$ such that each pair is at a distance $d$
            from each other. Let $F_d$ be such a set.
            $V_d \leftarrow V_d \setminus F_d$
            $\Delta$ = ParallelCompute($A$, $D$, $\Lambda$, $F_d$)
    return $\Delta$



Correctness: ParallelCompute computes the
dependencies of (at most $n/2$ pairs of) vertices
(such that each pair of vertices is at a distance
of $d$ from each other) in parallel. This can be
done in $O(\log k)$ time, since this involves
computing the sum of $k$ values. When there are
multiple vertices at distance $d$ (from a vertex $v$)
the algorithm is repeated until all the pairs'
dependencies are calculated. Note that there can
be at most $O(maxdeg(G))$ such nodes, where
$maxdeg(G)$ is the maximum degree of any vertex
in the graph. Hence ParallelCompute takes
$O(maxdeg(G) \log k) = O(\log m)$ time (since we
are interested in bounded-degree graphs). Since
there are at most $n$ different distances and
$O(n^2)$ pairs of dependencies to be computed,
ParallelCompute is called at most $O(n)$ times. The
constant in $O(n)$ depends on the distribution of
the ($n$ possible) distances among the $O(n^2)$ pairs
of vertices. Note that this gives an optimal
algorithm.



ParallelCompute($A$, $D$, $\Lambda$, $F_d$)

    Let $m$ processors represent the edges.
    For each pair $u, v \in F_d$ (such that $d(u,v) = d$)
    do the following in parallel:
    o Let $w_1, w_2, w_3, \ldots, w_k$ be the vertices such
      that $v \in P_u(w_i)$.
    o The processor representing edge $(v, w_i)$
      calculates $\frac{1}{\lambda_{u w_i}} (1 + \delta_{u*}(w_i))$.
    o The $k$ processors (representing the edges
      $(v, w_i)$) compute the sum
      $\sum_{i=1}^{k} \frac{1}{\lambda_{u w_i}} (1 + \delta_{u*}(w_i))$.
    o This sum is multiplied by $\lambda_{uv}$ and stored in
      the shared memory as $\delta_{uv}$.
    o Compute $\delta_{vu}$ similarly.
    o If there are multiple vertices at distance
      $d$ from $v$ then repeat the algorithm
      ParallelCompute for the remaining pairs of vertices.
    return $\Delta$



Theorem 4.6. The dependencies in an
unweighted graph can be computed in $O(n \log m)$
time using $O(m)$ processors.

For weighted graphs with integer weights
taken from the range $\{1, 2, \ldots, M\}$, the
distances vary from $nM$ to 1.

Theorem 4.7. The dependencies in a weighted
graph with integer weights taken from the range
$\{1, 2, \ldots, M\}$ can be computed in
$O(M n \log m)$ time using $O(m)$ processors.

5 Open Problems 

1. Is there an algorithm to compute (exactly
or approximately) the betweenness of all (or
even the top $k$) vertices in sub-cubic (or $o(mn)$)
time?

2. Since the networks of interest are huge
and dynamic, it is expensive to recompute
betweenness for every addition/deletion of an
edge. Is there a fully dynamic algorithm
to maintain betweenness in $O(n^2)$ amortized
time per update (edge insertion or deletion),
using only $O(n^2)$ space? Here, it is crucial
to observe that the betweenness centrality of all
vertices can be changed by deleting (hence
adding) a single edge of the graph. For example,
let $C_{4k+1}$ be a cycle on $4k + 1$ vertices.
The centrality of any vertex in $C_{4k+1}$
is $k(2k-1)$. Removing an edge from $C_{4k+1}$ results
in a path $P_{4k+1}$ on $4k + 1$ vertices. The
betweenness values of the vertices of $P_{4k+1}$ are
$0, 4k-1, 2(4k-2), \ldots, 4k^2, \ldots, 2(4k-2), 4k-1, 0$.

3. Betweenness centrality implicitly assumes
that communications in the network use
shortest paths. Shortest paths are sensitive
to local changes (addition/deletion of
edges). One possible way to address this issue
is to consider $\delta$-stretch paths, instead of
shortest paths [7]. A $\delta$-stretch path is a path
from $s$ to $t$ of length $\le (1 + \delta)\, d(s,t)$.
What is the complexity of computing betweenness
based on $\delta$-stretch paths?

4. Our conjectures mentioned in Section 2 are
open.